Overview
Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 1133 |
| Missing cells | 237 |
| Missing cells (%) | 1.1% |
| Duplicate rows | 11 |
| Duplicate rows (%) | 1.0% |
| Total size in memory | 177.0 KiB |
| Average record size in memory | 160.0 B |
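The percentages and the average record size follow from the counts in the table. A minimal Python sketch of that arithmetic, using only the figures reported above; the variable names are illustrative, and it assumes 1 KiB = 1024 bytes:

```python
# Figures copied from the Dataset statistics table.
n_variables = 19
n_observations = 1133
missing_cells = 237
duplicate_rows = 11
total_size_kib = 177.0  # assumption: 1 KiB = 1024 bytes

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal, the three results agree with the reported 1.1%, 1.0%, and 160.0 B.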
Variable types
| Numeric | 19 |
|---|---|
Alerts
| Dataset has 11 (1.0%) duplicate rows | Duplicates |
|---|---|
| GOODS_DESCRIPTION_len_chars_max is highly overall correlated with GOODS_DESCRIPTION_len_chars_mean and 9 other fields | High correlation |
| GOODS_DESCRIPTION_len_chars_mean is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 7 other fields | High correlation |
| GOODS_DESCRIPTION_len_chars_median is highly overall correlated with GOODS_DESCRIPTION_len_chars_mean and 3 other fields | High correlation |
| GOODS_DESCRIPTION_len_chars_min is highly overall correlated with GOODS_DESCRIPTION_len_chars_sum and 5 other fields | High correlation |
| GOODS_DESCRIPTION_len_chars_std is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 4 other fields | High correlation |
| GOODS_DESCRIPTION_len_chars_sum is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 9 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_max is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 9 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_mean is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 8 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_median is highly overall correlated with GOODS_DESCRIPTION_len_chars_mean and 2 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_min is highly overall correlated with GOODS_DESCRIPTION_len_chars_min and 5 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_std is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 5 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_sum is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 8 other fields | High correlation |
| HS06_count is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 7 other fields | High correlation |
| subtokenization_indicator_max is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 9 other fields | High correlation |
| subtokenization_indicator_mean is highly overall correlated with subtokenization_indicator_max and 2 other fields | High correlation |
| subtokenization_indicator_median is highly overall correlated with subtokenization_indicator_mean | High correlation |
| subtokenization_indicator_std is highly overall correlated with subtokenization_indicator_max and 2 other fields | High correlation |
| subtokenization_indicator_sum is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 8 other fields | High correlation |
| GOODS_DESCRIPTION_len_words_std has 79 (7.0%) missing values | Missing |
| GOODS_DESCRIPTION_len_chars_std has 79 (7.0%) missing values | Missing |
| subtokenization_indicator_std has 79 (7.0%) missing values | Missing |
| subtokenization_indicator_std has 19 (1.7%) zeros | Zeros |
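One plausible reading of the three Missing alerts, which the report itself does not confirm: the `_std` columns look like per-group sample standard deviations, and a sample standard deviation is undefined for a group with a single observation. The HS06_count value table lists 79 rows with a count of 1, which matches the 79 missing values in each `_std` column. The sketch below illustrates the idea with a hypothetical helper `sample_std` and made-up input values:

```python
import statistics

def sample_std(values):
    """Sample standard deviation (n - 1 denominator), or None if undefined."""
    if len(values) < 2:
        return None  # one observation: the n - 1 denominator is zero
    return statistics.stdev(values)

# Hypothetical description lengths for two groups (not taken from the data).
single_row_group = sample_std([27])
two_row_group = sample_std([20, 21])

# Three _std columns, each missing in 79 rows.
missing_cells = 3 * 79

print(single_row_group, two_row_group, missing_cells)
```

Under this reading, the 237 missing cells in the dataset statistics are exactly the three `_std` columns times 79 single-row groups.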
Reproduction
| Analysis started | 2025-05-15 18:00:00.012751 |
|---|---|
| Analysis finished | 2025-05-15 18:02:35.771190 |
| Duration | 2 minutes and 35.76 seconds |
| Software version | ydata-profiling v4.12.1 |
| Download configuration | config.json |
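The reported duration can be checked against the two timestamps. A small sketch using Python's standard `datetime` module; the format string is an assumption about how the timestamps are laid out:

```python
from datetime import datetime

# Assumed layout of the timestamps shown in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2025-05-15 18:00:00.012751", FMT)
finished = datetime.strptime("2025-05-15 18:02:35.771190", FMT)

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```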
Variables
HS06_count
Real number (ℝ)
High correlation 
| Distinct | 392 |
|---|---|
| Distinct (%) | 34.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 236.34598 |
| Minimum | 1 |
| Maximum | 8524 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 8 |
| median | 40 |
| Q3 | 167 |
| 95-th percentile | 1103 |
| Maximum | 8524 |
| Range | 8523 |
| Interquartile range (IQR) | 159 |
Descriptive statistics
| Standard deviation | 637.71893 |
|---|---|
| Coefficient of variation (CV) | 2.6982431 |
| Kurtosis | 56.199444 |
| Mean | 236.34598 |
| Median Absolute Deviation (MAD) | 37 |
| Skewness | 6.4284881 |
| Sum | 267780 |
| Variance | 406685.43 |
| Monotonicity | Not monotonic |
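Several of the descriptive statistics are functions of the others, which gives a quick consistency check. The sketch below recomputes them for HS06_count from the rounded figures in the tables above, so small rounding differences are expected; the variable names are illustrative:

```python
# Rounded figures copied from the HS06_count tables.
n_observations = 1133
mean = 236.34598
std = 637.71893
q1, q3 = 8, 167
minimum, maximum = 1, 8524

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
total = mean * n_observations    # sum of all values
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum

print(f"CV={cv:.7f} variance={variance:.2f} sum={total:.0f} "
      f"IQR={iqr} range={value_range}")
```

A CV well above 1 together with a mean (236.3) far above the median (40) points to a strongly right-skewed count, in line with the reported skewness of 6.43.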
| Value | Count | Frequency (%) |
| 1 | 79 | 7.0% |
| 2 | 42 | 3.7% |
| 3 | 36 | 3.2% |
| 4 | 34 | 3.0% |
| 7 | 24 | 2.1% |
| 6 | 24 | 2.1% |
| 8 | 23 | 2.0% |
| 5 | 23 | 2.0% |
| 9 | 22 | 1.9% |
| 13 | 16 | 1.4% |
| Other values (382) | 810 | 71.5% |
| Value | Count | Frequency (%) |
| 1 | 79 | 7.0% |
| 2 | 42 | 3.7% |
| 3 | 36 | 3.2% |
| 4 | 34 | 3.0% |
| 5 | 23 | 2.0% |
| 6 | 24 | 2.1% |
| 7 | 24 | 2.1% |
| 8 | 23 | 2.0% |
| 9 | 22 | 1.9% |
| 10 | 11 | 1.0% |
| Value | Count | Frequency (%) |
| 8524 | 1 | 0.1% |
| 7341 | 1 | 0.1% |
| 5700 | 1 | 0.1% |
| 5487 | 1 | 0.1% |
| 4895 | 1 | 0.1% |
| 4593 | 1 | 0.1% |
| 3910 | 1 | 0.1% |
| 3819 | 1 | 0.1% |
| 3562 | 1 | 0.1% |
| 3485 | 1 | 0.1% |
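The Frequency (%) column is the count divided by the 1133 observations, shown to one decimal. A sketch using the HS06_count counts listed above; `frequency_pct` is a hypothetical helper, not part of ydata-profiling:

```python
N_OBSERVATIONS = 1133

def frequency_pct(count, n=N_OBSERVATIONS):
    """Share of observations with this value, as a percentage with one decimal."""
    return round(100 * count / n, 1)

# Ten most common HS06_count values and their counts, from the table above.
top_counts = {1: 79, 2: 42, 3: 36, 4: 34, 7: 24, 6: 24, 8: 23, 5: 23, 9: 22, 13: 16}
other_count = 810  # the "Other values (382)" row

for value, count in top_counts.items():
    print(value, count, f"{frequency_pct(count)}%")
print("Other values", other_count, f"{frequency_pct(other_count)}%")

# The listed counts plus the remainder account for every observation.
assert sum(top_counts.values()) + other_count == N_OBSERVATIONS
```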
GOODS_DESCRIPTION_len_words_sum
Real number (ℝ)
High correlation 
| Distinct | 640 |
|---|---|
| Distinct (%) | 56.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1084.4731 |
| Minimum | 1 |
| Maximum | 42209 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 30 |
| median | 170 |
| Q3 | 735 |
| 95-th percentile | 5179.4 |
| Maximum | 42209 |
| Range | 42208 |
| Interquartile range (IQR) | 705 |
Descriptive statistics
| Standard deviation | 3049.806 |
|---|---|
| Coefficient of variation (CV) | 2.8122469 |
| Kurtosis | 58.849811 |
| Mean | 1084.4731 |
| Median Absolute Deviation (MAD) | 162 |
| Skewness | 6.5871419 |
| Sum | 1228708 |
| Variance | 9301316.7 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 31 | 2.7% |
| 3 | 21 | 1.9% |
| 1 | 18 | 1.6% |
| 6 | 17 | 1.5% |
| 5 | 13 | 1.1% |
| 4 | 12 | 1.1% |
| 7 | 12 | 1.1% |
| 16 | 12 | 1.1% |
| 8 | 12 | 1.1% |
| 10 | 10 | 0.9% |
| Other values (630) | 975 | 86.1% |
| Value | Count | Frequency (%) |
| 1 | 18 | 1.6% |
| 2 | 31 | 2.7% |
| 3 | 21 | 1.9% |
| 4 | 12 | 1.1% |
| 5 | 13 | 1.1% |
| 6 | 17 | 1.5% |
| 7 | 12 | 1.1% |
| 8 | 12 | 1.1% |
| 9 | 10 | 0.9% |
| 10 | 10 | 0.9% |
| Value | Count | Frequency (%) |
| 42209 | 1 | 0.1% |
| 34553 | 1 | 0.1% |
| 26232 | 1 | 0.1% |
| 25157 | 1 | 0.1% |
| 22219 | 1 | 0.1% |
| 21423 | 1 | 0.1% |
| 19726 | 1 | 0.1% |
| 19464 | 1 | 0.1% |
| 18485 | 1 | 0.1% |
| 17690 | 1 | 0.1% |
GOODS_DESCRIPTION_len_words_min
Real number (ℝ)
High correlation 
| Distinct | 9 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.3857017 |
| Minimum | 1 |
| Maximum | 13 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 3 |
| Maximum | 13 |
| Range | 12 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.85547931 |
|---|---|
| Coefficient of variation (CV) | 0.61736182 |
| Kurtosis | 43.380671 |
| Mean | 1.3857017 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.9802686 |
| Sum | 1570 |
| Variance | 0.73184485 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 820 | 72.4% |
| 2 | 252 | 22.2% |
| 3 | 33 | 2.9% |
| 4 | 14 | 1.2% |
| 6 | 6 | 0.5% |
| 5 | 5 | 0.4% |
| 13 | 1 | 0.1% |
| 8 | 1 | 0.1% |
| 9 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 1 | 820 | 72.4% |
| 2 | 252 | 22.2% |
| 3 | 33 | 2.9% |
| 4 | 14 | 1.2% |
| 5 | 5 | 0.4% |
| 6 | 6 | 0.5% |
| 8 | 1 | 0.1% |
| 9 | 1 | 0.1% |
| 13 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 13 | 1 | 0.1% |
| 9 | 1 | 0.1% |
| 8 | 1 | 0.1% |
| 6 | 6 | 0.5% |
| 5 | 5 | 0.4% |
| 4 | 14 | 1.2% |
| 3 | 33 | 2.9% |
| 2 | 252 | 22.2% |
| 1 | 820 | 72.4% |
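Because GOODS_DESCRIPTION_len_words_min takes only nine distinct values, the whole column can be rebuilt from its value counts and the summary statistics recomputed. The sketch below does this with Python's `statistics` module. It assumes the reported variance uses the sample (n - 1) denominator, which the figures are consistent with:

```python
import statistics

# Value -> count, copied from the frequency table above.
value_counts = {1: 820, 2: 252, 3: 33, 4: 14, 5: 5, 6: 6, 8: 1, 9: 1, 13: 1}

# Expand the counts back into one entry per observation.
values = [value for value, count in value_counts.items() for _ in range(count)]

mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
# Median absolute deviation: median distance from the median.
mad = statistics.median([abs(v - median) for v in values])

print(len(values), sum(values), mean, median, variance, mad)
```

The MAD is 0 because more than half of the observations (820 of 1133) equal the median of 1, so the median distance from the median is also 0.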
GOODS_DESCRIPTION_len_words_mean
Real number (ℝ)
High correlation 
| Distinct | 801 |
|---|---|
| Distinct (%) | 70.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.0168911 |
| Minimum | 1 |
| Maximum | 13 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3.3225806 |
| median | 4 |
| Q3 | 4.7058824 |
| 95-th percentile | 6 |
| Maximum | 13 |
| Range | 12 |
| Interquartile range (IQR) | 1.3833017 |
Descriptive statistics
| Standard deviation | 1.2578939 |
|---|---|
| Coefficient of variation (CV) | 0.3131511 |
| Kurtosis | 3.4291344 |
| Mean | 4.0168911 |
| Median Absolute Deviation (MAD) | 0.69565217 |
| Skewness | 0.65477984 |
| Sum | 4551.1376 |
| Variance | 1.582297 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 44 | 3.9% |
| 3 | 32 | 2.8% |
| 1 | 20 | 1.8% |
| 2.5 | 20 | 1.8% |
| 4 | 17 | 1.5% |
| 5 | 14 | 1.2% |
| 3.5 | 13 | 1.1% |
| 3.666666667 | 12 | 1.1% |
| 4.5 | 10 | 0.9% |
| 3.333333333 | 10 | 0.9% |
| Other values (791) | 941 | 83.1% |
| Value | Count | Frequency (%) |
| 1 | 20 | 1.8% |
| 1.333333333 | 1 | 0.1% |
| 1.5 | 8 | 0.7% |
| 1.6 | 2 | 0.2% |
| 1.666666667 | 1 | 0.1% |
| 1.736842105 | 1 | 0.1% |
| 1.75 | 3 | 0.3% |
| 1.833333333 | 1 | 0.1% |
| 1.875 | 1 | 0.1% |
| 1.964285714 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 13 | 1 | 0.1% |
| 9.727272727 | 1 | 0.1% |
| 9 | 3 | 0.3% |
| 8.717391304 | 1 | 0.1% |
| 8.045454545 | 1 | 0.1% |
| 8 | 3 | 0.3% |
| 7.794871795 | 1 | 0.1% |
| 7.776315789 | 1 | 0.1% |
| 7.690140845 | 1 | 0.1% |
| 7.649667406 | 1 | 0.1% |
GOODS_DESCRIPTION_len_words_median
Real number (ℝ)
High correlation 
| Distinct | 17 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.5141218 |
| Minimum | 1 |
| Maximum | 13 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 3 |
| Q3 | 4 |
| 95-th percentile | 5.5 |
| Maximum | 13 |
| Range | 12 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.1852577 |
|---|---|
| Coefficient of variation (CV) | 0.33728417 |
| Kurtosis | 5.5821362 |
| Mean | 3.5141218 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.2701576 |
| Sum | 3981.5 |
| Variance | 1.4048357 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 3 | 374 | 33.0% |
| 4 | 371 | 32.7% |
| 2 | 130 | 11.5% |
| 5 | 83 | 7.3% |
| 2.5 | 39 | 3.4% |
| 3.5 | 31 | 2.7% |
| 6 | 30 | 2.6% |
| 1 | 23 | 2.0% |
| 4.5 | 14 | 1.2% |
| 7 | 9 | 0.8% |
| Other values (7) | 29 | 2.6% |
| Value | Count | Frequency (%) |
| 1 | 23 | 2.0% |
| 1.5 | 9 | 0.8% |
| 2 | 130 | 11.5% |
| 2.5 | 39 | 3.4% |
| 3 | 374 | 33.0% |
| 3.5 | 31 | 2.7% |
| 4 | 371 | 32.7% |
| 4.5 | 14 | 1.2% |
| 5 | 83 | 7.3% |
| 5.5 | 4 | 0.4% |
| Value | Count | Frequency (%) |
| 13 | 1 | 0.1% |
| 9 | 2 | 0.2% |
| 8.5 | 2 | 0.2% |
| 8 | 7 | 0.6% |
| 7 | 9 | 0.8% |
| 6.5 | 4 | 0.4% |
| 6 | 30 | 2.6% |
| 5.5 | 4 | 0.4% |
| 5 | 83 | 7.3% |
| 4.5 | 14 | 1.2% |
GOODS_DESCRIPTION_len_words_max
Real number (ℝ)
High correlation 
| Distinct | 36 |
|---|---|
| Distinct (%) | 3.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.319506 |
| Minimum | 1 |
| Maximum | 41 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 7 |
| median | 11 |
| Q3 | 17 |
| 95-th percentile | 26 |
| Maximum | 41 |
| Range | 40 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 7.2390004 |
|---|---|
| Coefficient of variation (CV) | 0.58760477 |
| Kurtosis | -0.1382438 |
| Mean | 12.319506 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 0.59078136 |
| Sum | 13958 |
| Variance | 52.403126 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 11 | 69 | 6.1% |
| 10 | 67 | 5.9% |
| 14 | 60 | 5.3% |
| 12 | 60 | 5.3% |
| 8 | 59 | 5.2% |
| 4 | 58 | 5.1% |
| 5 | 57 | 5.0% |
| 7 | 51 | 4.5% |
| 6 | 50 | 4.4% |
| 3 | 50 | 4.4% |
| Other values (26) | 552 | 48.7% |
| Value | Count | Frequency (%) |
| 1 | 20 | 1.8% |
| 2 | 47 | 4.1% |
| 3 | 50 | 4.4% |
| 4 | 58 | 5.1% |
| 5 | 57 | 5.0% |
| 6 | 50 | 4.4% |
| 7 | 51 | 4.5% |
| 8 | 59 | 5.2% |
| 9 | 47 | 4.1% |
| 10 | 67 | 5.9% |
| Value | Count | Frequency (%) |
| 41 | 1 | 0.1% |
| 37 | 2 | 0.2% |
| 34 | 1 | 0.1% |
| 33 | 1 | 0.1% |
| 32 | 3 | 0.3% |
| 31 | 3 | 0.3% |
| 30 | 6 | 0.5% |
| 29 | 5 | 0.4% |
| 28 | 12 | 1.1% |
| 27 | 9 | 0.8% |
GOODS_DESCRIPTION_len_words_std
Real number (ℝ)
High correlation, Missing
| Distinct | 964 |
|---|---|
| Distinct (%) | 91.5% |
| Missing | 79 |
| Missing (%) | 7.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.3958431 |
| Minimum | 0 |
| Maximum | 8.845903 |
| Zeros | 10 |
| Zeros (%) | 0.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.74051142 |
| Q1 | 1.763664 |
| median | 2.3378609 |
| Q3 | 2.9544606 |
| 95-th percentile | 4.0610509 |
| Maximum | 8.845903 |
| Range | 8.845903 |
| Interquartile range (IQR) | 1.1907966 |
Descriptive statistics
| Standard deviation | 1.0533187 |
|---|---|
| Coefficient of variation (CV) | 0.43964428 |
| Kurtosis | 4.5760451 |
| Mean | 2.3958431 |
| Median Absolute Deviation (MAD) | 0.59579506 |
| Skewness | 1.1140334 |
| Sum | 2525.2187 |
| Variance | 1.1094804 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.7071067812 | 20 | 1.8% |
| 0 | 10 | 0.9% |
| 1.527525232 | 7 | 0.6% |
| 1.414213562 | 6 | 0.5% |
| 0.8164965809 | 6 | 0.5% |
| 0.5773502692 | 6 | 0.5% |
| 1.732050808 | 5 | 0.4% |
| 0.9574271078 | 5 | 0.4% |
| 1 | 5 | 0.4% |
| 2.121320344 | 4 | 0.4% |
| Other values (954) | 980 | 86.5% |
| (Missing) | 79 | 7.0% |
| Value | Count | Frequency (%) |
| 0 | 10 | 0.9% |
| 0.3535533906 | 1 | 0.1% |
| 0.377964473 | 1 | 0.1% |
| 0.4472135955 | 1 | 0.1% |
| 0.4472135955 | 1 | 0.1% |
| 0.5 | 2 | 0.2% |
| 0.5345224838 | 1 | 0.1% |
| 0.5477225575 | 1 | 0.1% |
| 0.5477225575 | 2 | 0.2% |
| 0.5773502692 | 6 | 0.5% |
| Value | Count | Frequency (%) |
| 8.845903006 | 1 | 0.1% |
| 8.485281374 | 1 | 0.1% |
| 8.354615002 | 1 | 0.1% |
| 8.082903769 | 1 | 0.1% |
| 6.938848539 | 1 | 0.1% |
| 6.31016496 | 1 | 0.1% |
| 6.309478886 | 1 | 0.1% |
| 6.228964601 | 1 | 0.1% |
| 6.089563178 | 1 | 0.1% |
| 5.879747322 | 1 | 0.1% |
GOODS_DESCRIPTION_len_chars_sum
Real number (ℝ)
High correlation 
| Distinct | 934 |
|---|---|
| Distinct (%) | 82.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6906.8358 |
| Minimum | 4 |
| Maximum | 284435 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 19 |
| Q1 | 191 |
| median | 1083 |
| Q3 | 4601 |
| 95-th percentile | 32887 |
| Maximum | 284435 |
| Range | 284431 |
| Interquartile range (IQR) | 4410 |
Descriptive statistics
| Standard deviation | 19697.754 |
|---|---|
| Coefficient of variation (CV) | 2.8519216 |
| Kurtosis | 62.983743 |
| Mean | 6906.8358 |
| Median Absolute Deviation (MAD) | 1031 |
| Skewness | 6.7956587 |
| Sum | 7825445 |
| Variance | 3.8800152 × 10^8 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 10 | 7 | 0.6% |
| 14 | 7 | 0.6% |
| 8 | 6 | 0.5% |
| 11 | 6 | 0.5% |
| 30 | 5 | 0.4% |
| 13 | 5 | 0.4% |
| 19 | 5 | 0.4% |
| 24 | 5 | 0.4% |
| 16 | 4 | 0.4% |
| 75 | 4 | 0.4% |
| Other values (924) | 1079 | 95.2% |
| Value | Count | Frequency (%) |
| 4 | 4 | 0.4% |
| 5 | 4 | 0.4% |
| 6 | 1 | 0.1% |
| 7 | 1 | 0.1% |
| 8 | 6 | 0.5% |
| 9 | 3 | 0.3% |
| 10 | 7 | 0.6% |
| 11 | 6 | 0.5% |
| 12 | 2 | 0.2% |
| 13 | 5 | 0.4% |
| Value | Count | Frequency (%) |
| 284435 | 1 | 0.1% |
| 210370 | 1 | 0.1% |
| 171533 | 1 | 0.1% |
| 167072 | 1 | 0.1% |
| 146538 | 1 | 0.1% |
| 142074 | 1 | 0.1% |
| 137014 | 1 | 0.1% |
| 128603 | 1 | 0.1% |
| 120507 | 1 | 0.1% |
| 110031 | 1 | 0.1% |
GOODS_DESCRIPTION_len_chars_min
Real number (ℝ)
High correlation 
| Distinct | 39 |
|---|---|
| Distinct (%) | 3.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.9541041 |
| Minimum | 2 |
| Maximum | 88 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 4 |
| median | 6 |
| Q3 | 10 |
| 95-th percentile | 17 |
| Maximum | 88 |
| Range | 86 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 5.9787324 |
|---|---|
| Coefficient of variation (CV) | 0.75165378 |
| Kurtosis | 37.47767 |
| Mean | 7.9541041 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 4.3789126 |
| Sum | 9012 |
| Variance | 35.745242 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 4 | 160 | 14.1% |
| 6 | 153 | 13.5% |
| 5 | 143 | 12.6% |
| 3 | 118 | 10.4% |
| 7 | 102 | 9.0% |
| 9 | 82 | 7.2% |
| 8 | 77 | 6.8% |
| 10 | 63 | 5.6% |
| 11 | 49 | 4.3% |
| 12 | 30 | 2.6% |
| Other values (29) | 156 | 13.8% |
| Value | Count | Frequency (%) |
| 2 | 14 | 1.2% |
| 3 | 118 | 10.4% |
| 4 | 160 | 14.1% |
| 5 | 143 | 12.6% |
| 6 | 153 | 13.5% |
| 7 | 102 | 9.0% |
| 8 | 77 | 6.8% |
| 9 | 82 | 7.2% |
| 10 | 63 | 5.6% |
| 11 | 49 | 4.3% |
| Value | Count | Frequency (%) |
| 88 | 1 | 0.1% |
| 56 | 1 | 0.1% |
| 50 | 1 | 0.1% |
| 40 | 2 | 0.2% |
| 38 | 1 | 0.1% |
| 37 | 1 | 0.1% |
| 34 | 1 | 0.1% |
| 33 | 2 | 0.2% |
| 32 | 1 | 0.1% |
| 31 | 2 | 0.2% |
GOODS_DESCRIPTION_len_chars_mean
Real number (ℝ)
High correlation 
| Distinct | 949 |
|---|---|
| Distinct (%) | 83.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25.663895 |
| Minimum | 4 |
| Maximum | 88 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 12.06 |
| Q1 | 21.171429 |
| median | 25.594203 |
| Q3 | 30.153846 |
| 95-th percentile | 38.914286 |
| Maximum | 88 |
| Range | 84 |
| Interquartile range (IQR) | 8.9824176 |
Descriptive statistics
| Standard deviation | 8.2451018 |
|---|---|
| Coefficient of variation (CV) | 0.32127242 |
| Kurtosis | 3.9242582 |
| Mean | 25.663895 |
| Median Absolute Deviation (MAD) | 4.4522674 |
| Skewness | 0.68101763 |
| Sum | 29077.193 |
| Variance | 67.981704 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 14 | 10 | 0.9% |
| 24 | 10 | 0.9% |
| 11 | 9 | 0.8% |
| 13 | 8 | 0.7% |
| 22 | 8 | 0.7% |
| 8 | 7 | 0.6% |
| 10 | 7 | 0.6% |
| 28 | 6 | 0.5% |
| 15 | 6 | 0.5% |
| 16 | 5 | 0.4% |
| Other values (939) | 1057 | 93.3% |
| Value | Count | Frequency (%) |
| 4 | 4 | 0.4% |
| 5 | 5 | 0.4% |
| 6 | 2 | 0.2% |
| 7 | 2 | 0.2% |
| 7.666666667 | 1 | 0.1% |
| 7.75 | 1 | 0.1% |
| 8 | 7 | 0.6% |
| 8.5 | 1 | 0.1% |
| 8.75 | 1 | 0.1% |
| 9 | 3 | 0.3% |
| Value | Count | Frequency (%) |
| 88 | 1 | 0.1% |
| 59.5 | 1 | 0.1% |
| 58.4439523 | 1 | 0.1% |
| 57.5 | 1 | 0.1% |
| 57.45454545 | 1 | 0.1% |
| 56.22727273 | 1 | 0.1% |
| 56 | 1 | 0.1% |
| 53.83333333 | 1 | 0.1% |
| 52 | 1 | 0.1% |
| 51.38235294 | 1 | 0.1% |
GOODS_DESCRIPTION_len_chars_median
Real number (ℝ)
High correlation 
| Distinct | 84 |
|---|---|
| Distinct (%) | 7.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.67564 |
| Minimum | 4 |
| Maximum | 88 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 18 |
| median | 22.5 |
| Q3 | 26 |
| 95-th percentile | 35 |
| Maximum | 88 |
| Range | 84 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 7.734842 |
|---|---|
| Coefficient of variation (CV) | 0.34110799 |
| Kurtosis | 7.3585189 |
| Mean | 22.67564 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 1.3802229 |
| Sum | 25691.5 |
| Variance | 59.827781 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 21 | 76 | 6.7% |
| 26 | 67 | 5.9% |
| 24 | 66 | 5.8% |
| 23 | 62 | 5.5% |
| 25 | 58 | 5.1% |
| 22 | 56 | 4.9% |
| 20 | 49 | 4.3% |
| 18 | 46 | 4.1% |
| 19 | 46 | 4.1% |
| 27 | 43 | 3.8% |
| Other values (74) | 564 | 49.8% |
| Value | Count | Frequency (%) |
| 4 | 5 | 0.4% |
| 5 | 5 | 0.4% |
| 6 | 2 | 0.2% |
| 7 | 4 | 0.4% |
| 7.5 | 2 | 0.2% |
| 8 | 6 | 0.5% |
| 8.5 | 1 | 0.1% |
| 9 | 6 | 0.5% |
| 9.5 | 3 | 0.3% |
| 10 | 15 | 1.3% |
| Value | Count | Frequency (%) |
| 88 | 1 | 0.1% |
| 67 | 1 | 0.1% |
| 59.5 | 1 | 0.1% |
| 57.5 | 2 | 0.2% |
| 56 | 1 | 0.1% |
| 55 | 3 | 0.3% |
| 52 | 1 | 0.1% |
| 50 | 1 | 0.1% |
| 49 | 1 | 0.1% |
| 48 | 1 | 0.1% |
GOODS_DESCRIPTION_len_chars_max
Real number (ℝ)
High correlation 
| Distinct | 147 |
|---|---|
| Distinct (%) | 13.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 76.0812 |
| Minimum | 4 |
| Maximum | 150 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 40 |
| median | 76 |
| Q3 | 101 |
| 95-th percentile | 150 |
| Maximum | 150 |
| Range | 146 |
| Interquartile range (IQR) | 61 |
Descriptive statistics
| Standard deviation | 42.120773 |
|---|---|
| Coefficient of variation (CV) | 0.55362918 |
| Kurtosis | -0.95972037 |
| Mean | 76.0812 |
| Median Absolute Deviation (MAD) | 32 |
| Skewness | 0.25987727 |
| Sum | 86200 |
| Variance | 1774.1595 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 150 | 89 | 7.9% |
| 100 | 45 | 4.0% |
| 81 | 30 | 2.6% |
| 149 | 23 | 2.0% |
| 80 | 19 | 1.7% |
| 88 | 19 | 1.7% |
| 26 | 16 | 1.4% |
| 38 | 16 | 1.4% |
| 46 | 14 | 1.2% |
| 99 | 13 | 1.1% |
| Other values (137) | 849 | 74.9% |
| Value | Count | Frequency (%) |
| 4 | 4 | 0.4% |
| 5 | 4 | 0.4% |
| 6 | 2 | 0.2% |
| 7 | 2 | 0.2% |
| 8 | 6 | 0.5% |
| 9 | 3 | 0.3% |
| 10 | 11 | 1.0% |
| 11 | 7 | 0.6% |
| 12 | 4 | 0.4% |
| 13 | 8 | 0.7% |
| Value | Count | Frequency (%) |
| 150 | 89 | 7.9% |
| 149 | 23 | 2.0% |
| 148 | 5 | 0.4% |
| 147 | 5 | 0.4% |
| 146 | 1 | 0.1% |
| 145 | 2 | 0.2% |
| 144 | 4 | 0.4% |
| 143 | 4 | 0.4% |
| 142 | 3 | 0.3% |
| 141 | 2 | 0.2% |
GOODS_DESCRIPTION_len_chars_std
Real number (ℝ)
High correlation, Missing
| Distinct | 1032 |
|---|---|
| Distinct (%) | 97.9% |
| Missing | 79 |
| Missing (%) | 7.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.977186 |
| Minimum | 0 |
| Maximum | 50.871453 |
| Zeros | 2 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4.9497475 |
| Q1 | 10.66257 |
| median | 14.598446 |
| Q3 | 18.648641 |
| 95-th percentile | 25.756896 |
| Maximum | 50.871453 |
| Range | 50.871453 |
| Interquartile range (IQR) | 7.9860707 |
Descriptive statistics
| Standard deviation | 6.6164513 |
|---|---|
| Coefficient of variation (CV) | 0.44176866 |
| Kurtosis | 2.7979674 |
| Mean | 14.977186 |
| Median Absolute Deviation (MAD) | 3.9921465 |
| Skewness | 0.92662141 |
| Sum | 15785.954 |
| Variance | 43.777428 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1.414213562 | 4 | 0.4% |
| 4.242640687 | 3 | 0.3% |
| 3.535533906 | 3 | 0.3% |
| 0 | 2 | 0.2% |
| 37.4766594 | 2 | 0.2% |
| 6.363961031 | 2 | 0.2% |
| 9.899494937 | 2 | 0.2% |
| 2.828427125 | 2 | 0.2% |
| 0.7071067812 | 2 | 0.2% |
| 4.680252333 | 2 | 0.2% |
| Other values (1022) | 1030 | 90.9% |
| (Missing) | 79 | 7.0% |
| Value | Count | Frequency (%) |
| 0 | 2 | 0.2% |
| 0.7071067812 | 2 | 0.2% |
| 1.414213562 | 4 | 0.4% |
| 1.732050808 | 1 | 0.1% |
| 2.081665999 | 2 | 0.2% |
| 2.081665999 | 1 | 0.1% |
| 2.121320344 | 2 | 0.2% |
| 2.309401077 | 1 | 0.1% |
| 2.416461403 | 1 | 0.1% |
| 2.483277404 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 50.87145331 | 1 | 0.1% |
| 47.63052243 | 1 | 0.1% |
| 47.38559154 | 1 | 0.1% |
| 46.41428352 | 1 | 0.1% |
| 43.40506883 | 1 | 0.1% |
| 42.02491397 | 1 | 0.1% |
| 37.4766594 | 2 | 0.2% |
| 37.03340016 | 1 | 0.1% |
| 36.92753522 | 1 | 0.1% |
| 35.61706588 | 1 | 0.1% |
subtokenization_indicator_sum
Real number (ℝ)
High correlation 
| Distinct | 1005 |
|---|---|
| Distinct (%) | 88.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 460.22181 |
| Minimum | 1 |
| Maximum | 16890.965 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 14.75 |
| median | 78.238889 |
| Q3 | 313.48382 |
| 95-th percentile | 2150.1879 |
| Maximum | 16890.965 |
| Range | 16889.965 |
| Interquartile range (IQR) | 298.73382 |
Descriptive statistics
| Standard deviation | 1304.6422 |
|---|---|
| Coefficient of variation (CV) | 2.8348117 |
| Kurtosis | 56.102098 |
| Mean | 460.22181 |
| Median Absolute Deviation (MAD) | 73.638889 |
| Skewness | 6.5643408 |
| Sum | 521431.31 |
| Variance | 1702091.2 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 29 | 2.6% |
| 2 | 27 | 2.4% |
| 3 | 15 | 1.3% |
| 4 | 8 | 0.7% |
| 1.5 | 7 | 0.6% |
| 3.5 | 7 | 0.6% |
| 6 | 7 | 0.6% |
| 4.5 | 5 | 0.4% |
| 5 | 5 | 0.4% |
| 7 | 4 | 0.4% |
| Other values (995) | 1019 | 89.9% |
| Value | Count | Frequency (%) |
| 1 | 29 | 2.6% |
| 1.125 | 1 | 0.1% |
| 1.2 | 2 | 0.2% |
| 1.25 | 2 | 0.2% |
| 1.333333333 | 1 | 0.1% |
| 1.5 | 7 | 0.6% |
| 1.666666667 | 1 | 0.1% |
| 1.692307692 | 1 | 0.1% |
| 2 | 27 | 2.4% |
| 2.166666667 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 16890.96539 | 1 | 0.1% |
| 14580.36125 | 1 | 0.1% |
| 12428.75132 | 1 | 0.1% |
| 11086.66911 | 1 | 0.1% |
| 11050.40079 | 1 | 0.1% |
| 9379.219336 | 1 | 0.1% |
| 9249.022564 | 1 | 0.1% |
| 8523.682765 | 1 | 0.1% |
| 8400.774335 | 1 | 0.1% |
| 7634.583305 | 1 | 0.1% |
subtokenization_indicator_min
Real number (ℝ)
| Distinct | 42 |
|---|---|
| Distinct (%) | 3.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.1085662 |
| Minimum | 1 |
| Maximum | 4.3333333 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 1.6953846 |
| Maximum | 4.3333333 |
| Range | 3.3333333 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.35336092 |
|---|---|
| Coefficient of variation (CV) | 0.31875491 |
| Kurtosis | 29.617078 |
| Mean | 1.1085662 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.9070055 |
| Sum | 1256.0055 |
| Variance | 0.12486394 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 963 | 85.0% |
| 1.5 | 30 | 2.6% |
| 2 | 24 | 2.1% |
| 1.333333333 | 20 | 1.8% |
| 1.25 | 18 | 1.6% |
| 1.2 | 7 | 0.6% |
| 1.6 | 7 | 0.6% |
| 1.666666667 | 6 | 0.5% |
| 3 | 5 | 0.4% |
| 1.4 | 4 | 0.4% |
| Other values (32) | 49 | 4.3% |
| Value | Count | Frequency (%) |
| 1 | 963 | 85.0% |
| 1.125 | 2 | 0.2% |
| 1.142857143 | 3 | 0.3% |
| 1.166666667 | 3 | 0.3% |
| 1.2 | 7 | 0.6% |
| 1.222222222 | 1 | 0.1% |
| 1.25 | 18 | 1.6% |
| 1.285714286 | 1 | 0.1% |
| 1.333333333 | 20 | 1.8% |
| 1.363636364 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 4.333333333 | 1 | 0.1% |
| 4 | 3 | 0.3% |
| 3.666666667 | 1 | 0.1% |
| 3.6 | 1 | 0.1% |
| 3.5 | 1 | 0.1% |
| 3.333333333 | 1 | 0.1% |
| 3 | 5 | 0.4% |
| 2.666666667 | 1 | 0.1% |
| 2.571428571 | 1 | 0.1% |
| 2.5 | 3 | 0.3% |
subtokenization_indicator_mean
Real number (ℝ)
High correlation 
| Distinct | 993 |
|---|---|
| Distinct (%) | 87.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.8367172 |
| Minimum | 1 |
| Maximum | 5.2916667 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1.1666667 |
| Q1 | 1.5517844 |
| median | 1.7673721 |
| Q3 | 2.0457171 |
| 95-th percentile | 2.7215674 |
| Maximum | 5.2916667 |
| Range | 4.2916667 |
| Interquartile range (IQR) | 0.49393266 |
Descriptive statistics
| Standard deviation | 0.47938043 |
|---|---|
| Coefficient of variation (CV) | 0.2609985 |
| Kurtosis | 5.0398033 |
| Mean | 1.8367172 |
| Median Absolute Deviation (MAD) | 0.23887014 |
| Skewness | 1.4488669 |
| Sum | 2081.0006 |
| Variance | 0.2298056 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 44 | 3.9% |
| 2 | 22 | 1.9% |
| 1.5 | 21 | 1.9% |
| 1.166666667 | 6 | 0.5% |
| 1.25 | 6 | 0.5% |
| 3 | 5 | 0.4% |
| 1.2 | 5 | 0.4% |
| 2.25 | 4 | 0.4% |
| 1.583333333 | 4 | 0.4% |
| 4 | 3 | 0.3% |
| Other values (983) | 1013 | 89.4% |
| Value | Count | Frequency (%) |
| 1 | 44 | 3.9% |
| 1.055555556 | 1 | 0.1% |
| 1.083333333 | 2 | 0.2% |
| 1.1 | 1 | 0.1% |
| 1.104166667 | 1 | 0.1% |
| 1.111111111 | 1 | 0.1% |
| 1.114583333 | 1 | 0.1% |
| 1.125 | 3 | 0.3% |
| 1.133333333 | 1 | 0.1% |
| 1.145833333 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 5.291666667 | 1 | 0.1% |
| 4.333333333 | 1 | 0.1% |
| 4.179890333 | 1 | 0.1% |
| 4 | 3 | 0.3% |
| 3.635560676 | 1 | 0.1% |
| 3.6 | 1 | 0.1% |
| 3.543229167 | 1 | 0.1% |
| 3.5 | 1 | 0.1% |
| 3.43871876 | 1 | 0.1% |
| 3.375653614 | 1 | 0.1% |
subtokenization_indicator_median
Real number (ℝ)
High correlation 
| Distinct | 175 |
|---|---|
| Distinct (%) | 15.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.6505672 |
| Minimum | 1 |
| Maximum | 5.25 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1.4 |
| median | 1.5584416 |
| Q3 | 1.85 |
| 95-th percentile | 2.4785714 |
| Maximum | 5.25 |
| Range | 4.25 |
| Interquartile range (IQR) | 0.45 |
Descriptive statistics
| Standard deviation | 0.45717387 |
|---|---|
| Coefficient of variation (CV) | 0.27697986 |
| Kurtosis | 7.026777 |
| Mean | 1.6505672 |
| Median Absolute Deviation (MAD) | 0.22510823 |
| Skewness | 1.7536601 |
| Sum | 1870.0926 |
| Variance | 0.20900795 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1.5 | 223 | 19.7% |
| 2 | 126 | 11.1% |
| 1 | 108 | 9.5% |
| 1.666666667 | 92 | 8.1% |
| 1.333333333 | 58 | 5.1% |
| 1.75 | 45 | 4.0% |
| 1.6 | 26 | 2.3% |
| 1.4 | 22 | 1.9% |
| 1.25 | 21 | 1.9% |
| 1.8 | 18 | 1.6% |
| Other values (165) | 394 | 34.8% |
| Value | Count | Frequency (%) |
| 1 | 108 | 9.5% |
| 1.0625 | 2 | 0.2% |
| 1.071428571 | 1 | 0.1% |
| 1.083333333 | 3 | 0.3% |
| 1.1 | 3 | 0.3% |
| 1.125 | 6 | 0.5% |
| 1.166666667 | 6 | 0.5% |
| 1.166666667 | 5 | 0.4% |
| 1.171428571 | 1 | 0.1% |
| 1.181818182 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 5.25 | 1 | 0.1% |
| 4.333333333 | 1 | 0.1% |
| 4 | 3 | 0.3% |
| 3.6 | 1 | 0.1% |
| 3.5 | 1 | 0.1% |
| 3.444444444 | 1 | 0.1% |
| 3.333333333 | 1 | 0.1% |
| 3.291666667 | 1 | 0.1% |
| 3.225 | 1 | 0.1% |
| 3.2 | 1 | 0.1% |
subtokenization_indicator_max
Real number (ℝ)
High correlation 
| Distinct | 153 |
|---|---|
| Distinct (%) | 13.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.0039938 |
| Minimum | 1 |
| Maximum | 59 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1.3333333 |
| Q1 | 3 |
| median | 4.3333333 |
| Q3 | 7 |
| 95-th percentile | 17 |
| Maximum | 59 |
| Range | 58 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 5.2823348 |
|---|---|
| Coefficient of variation (CV) | 0.8798035 |
| Kurtosis | 13.703089 |
| Mean | 6.0039938 |
| Median Absolute Deviation (MAD) | 1.8333333 |
| Skewness | 2.8071111 |
| Sum | 6802.525 |
| Variance | 27.903061 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 3 | 99 | 8.7% |
| 4 | 81 | 7.1% |
| 2 | 68 | 6.0% |
| 5 | 65 | 5.7% |
| 6 | 57 | 5.0% |
| 1 | 44 | 3.9% |
| 7 | 43 | 3.8% |
| 9 | 36 | 3.2% |
| 8 | 31 | 2.7% |
| 2.5 | 31 | 2.7% |
| Other values (143) | 578 | 51.0% |
| Value | Count | Frequency (%) |
| 1 | 44 | 3.9% |
| 1.125 | 1 | 0.1% |
| 1.166666667 | 1 | 0.1% |
| 1.2 | 3 | 0.3% |
| 1.25 | 2 | 0.2% |
| 1.285714286 | 1 | 0.1% |
| 1.3125 | 1 | 0.1% |
| 1.333333333 | 7 | 0.6% |
| 1.4 | 3 | 0.3% |
| 1.5 | 23 | 2.0% |
| Value | Count | Frequency (%) |
| 59 | 1 | 0.1% |
| 38 | 1 | 0.1% |
| 36 | 1 | 0.1% |
| 33 | 1 | 0.1% |
| 32 | 1 | 0.1% |
| 30 | 1 | 0.1% |
| 27 | 2 | 0.2% |
| 26 | 3 | 0.3% |
| 25 | 3 | 0.3% |
| 24 | 2 | 0.2% |
subtokenization_indicator_std
Real number (ℝ)
High correlation  Missing  Zeros 
| Distinct | 1015 |
|---|---|
| Distinct (%) | 96.3% |
| Missing | 79 |
| Missing (%) | 7.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.85537665 |
| Minimum | 0 |
| Maximum | 6.8637026 |
| Zeros | 19 |
| Zeros (%) | 1.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.26341188 |
| Q1 | 0.57251557 |
| median | 0.76675762 |
| Q3 | 1.0013853 |
| 95-th percentile | 1.7524651 |
| Maximum | 6.8637026 |
| Range | 6.8637026 |
| Interquartile range (IQR) | 0.42886972 |
Descriptive statistics
| Standard deviation | 0.54019903 |
|---|---|
| Coefficient of variation (CV) | 0.63153352 |
| Kurtosis | 26.149073 |
| Mean | 0.85537665 |
| Median Absolute Deviation (MAD) | 0.21053179 |
| Skewness | 3.5872379 |
| Sum | 901.56699 |
| Variance | 0.29181499 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 19 | 1.7% |
| 0.7071067812 | 8 | 0.7% |
| 0.3535533906 | 4 | 0.4% |
| 0.2886751346 | 3 | 0.3% |
| 0.5 | 3 | 0.3% |
| 1.060660172 | 2 | 0.2% |
| 0.1443375673 | 2 | 0.2% |
| 0.4714045208 | 2 | 0.2% |
| 0.5773502692 | 2 | 0.2% |
| 0.8260094834 | 2 | 0.2% |
| Other values (1005) | 1007 | 88.9% |
| (Missing) | 79 | 7.0% |
| Value | Count | Frequency (%) |
| 0 | 19 | 1.7% |
| 0.04040610178 | 1 | 0.1% |
| 0.04714045208 | 1 | 0.1% |
| 0.1060660172 | 1 | 0.1% |
| 0.1154700538 | 1 | 0.1% |
| 0.1178511302 | 1 | 0.1% |
| 0.1187827742 | 1 | 0.1% |
| 0.1346870059 | 1 | 0.1% |
| 0.1443375673 | 2 | 0.2% |
| 0.1556749623 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 6.863702575 | 1 | 0.1% |
| 5.543882975 | 1 | 0.1% |
| 5.023361287 | 1 | 0.1% |
| 4.317555069 | 1 | 0.1% |
| 3.406505481 | 1 | 0.1% |
| 3.287053242 | 1 | 0.1% |
| 3.184653662 | 1 | 0.1% |
| 3.09319208 | 1 | 0.1% |
| 3.02736768 | 1 | 0.1% |
| 2.924630168 | 1 | 0.1% |
Interactions
Correlations
| | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_words_sum | HS06_count | subtokenization_indicator_max | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_min | subtokenization_indicator_std | subtokenization_indicator_sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GOODS_DESCRIPTION_len_chars_max | 1.000 | 0.669 | 0.479 | -0.487 | 0.786 | 0.858 | 0.966 | 0.670 | 0.485 | -0.390 | 0.764 | 0.854 | 0.821 | 0.690 | 0.187 | 0.102 | -0.368 | 0.371 | 0.817 |
| GOODS_DESCRIPTION_len_chars_mean | 0.669 | 1.000 | 0.881 | -0.012 | 0.778 | 0.504 | 0.653 | 0.940 | 0.793 | 0.029 | 0.770 | 0.494 | 0.408 | 0.381 | 0.295 | 0.259 | -0.072 | 0.212 | 0.423 |
| GOODS_DESCRIPTION_len_chars_median | 0.479 | 0.881 | 1.000 | 0.045 | 0.476 | 0.433 | 0.477 | 0.829 | 0.865 | 0.082 | 0.501 | 0.424 | 0.345 | 0.345 | 0.331 | 0.317 | 0.005 | 0.196 | 0.367 |
| GOODS_DESCRIPTION_len_chars_min | -0.487 | -0.012 | 0.045 | 1.000 | -0.199 | -0.627 | -0.498 | -0.066 | -0.002 | 0.732 | -0.202 | -0.632 | -0.668 | -0.534 | -0.053 | 0.017 | 0.437 | -0.288 | -0.658 |
| GOODS_DESCRIPTION_len_chars_std | 0.786 | 0.778 | 0.476 | -0.199 | 1.000 | 0.458 | 0.742 | 0.728 | 0.435 | -0.136 | 0.929 | 0.449 | 0.380 | 0.328 | 0.160 | 0.110 | -0.140 | 0.229 | 0.382 |
| GOODS_DESCRIPTION_len_chars_sum | 0.858 | 0.504 | 0.433 | -0.627 | 0.458 | 1.000 | 0.866 | 0.534 | 0.457 | -0.520 | 0.481 | 0.999 | 0.993 | 0.824 | 0.227 | 0.133 | -0.443 | 0.446 | 0.990 |
| GOODS_DESCRIPTION_len_words_max | 0.966 | 0.653 | 0.477 | -0.498 | 0.742 | 0.866 | 1.000 | 0.687 | 0.492 | -0.397 | 0.787 | 0.866 | 0.831 | 0.697 | 0.176 | 0.094 | -0.375 | 0.370 | 0.826 |
| GOODS_DESCRIPTION_len_words_mean | 0.670 | 0.940 | 0.829 | -0.066 | 0.728 | 0.534 | 0.687 | 1.000 | 0.849 | 0.022 | 0.790 | 0.537 | 0.446 | 0.360 | 0.185 | 0.165 | -0.133 | 0.140 | 0.448 |
| GOODS_DESCRIPTION_len_words_median | 0.485 | 0.793 | 0.865 | -0.002 | 0.435 | 0.457 | 0.492 | 0.849 | 1.000 | 0.073 | 0.480 | 0.460 | 0.381 | 0.312 | 0.193 | 0.186 | -0.069 | 0.113 | 0.387 |
| GOODS_DESCRIPTION_len_words_min | -0.390 | 0.029 | 0.082 | 0.732 | -0.136 | -0.520 | -0.397 | 0.022 | 0.073 | 1.000 | -0.156 | -0.519 | -0.563 | -0.546 | -0.159 | -0.052 | 0.384 | -0.403 | -0.566 |
| GOODS_DESCRIPTION_len_words_std | 0.764 | 0.770 | 0.501 | -0.202 | 0.929 | 0.481 | 0.787 | 0.790 | 0.480 | -0.156 | 1.000 | 0.480 | 0.405 | 0.354 | 0.172 | 0.125 | -0.129 | 0.237 | 0.407 |
| GOODS_DESCRIPTION_len_words_sum | 0.854 | 0.494 | 0.424 | -0.632 | 0.449 | 0.999 | 0.866 | 0.537 | 0.460 | -0.519 | 0.480 | 1.000 | 0.994 | 0.817 | 0.211 | 0.120 | -0.450 | 0.432 | 0.989 |
| HS06_count | 0.821 | 0.408 | 0.345 | -0.668 | 0.380 | 0.993 | 0.831 | 0.446 | 0.381 | -0.563 | 0.405 | 0.994 | 1.000 | 0.825 | 0.204 | 0.108 | -0.466 | 0.444 | 0.995 |
| subtokenization_indicator_max | 0.690 | 0.381 | 0.345 | -0.534 | 0.328 | 0.824 | 0.697 | 0.360 | 0.312 | -0.546 | 0.354 | 0.817 | 0.825 | 1.000 | 0.529 | 0.334 | -0.283 | 0.836 | 0.860 |
| subtokenization_indicator_mean | 0.187 | 0.295 | 0.331 | -0.053 | 0.160 | 0.227 | 0.176 | 0.185 | 0.193 | -0.159 | 0.172 | 0.211 | 0.204 | 0.529 | 1.000 | 0.911 | 0.329 | 0.682 | 0.295 |
| subtokenization_indicator_median | 0.102 | 0.259 | 0.317 | 0.017 | 0.110 | 0.133 | 0.094 | 0.165 | 0.186 | -0.052 | 0.125 | 0.120 | 0.108 | 0.334 | 0.911 | 1.000 | 0.408 | 0.426 | 0.193 |
| subtokenization_indicator_min | -0.368 | -0.072 | 0.005 | 0.437 | -0.140 | -0.443 | -0.375 | -0.133 | -0.069 | 0.384 | -0.129 | -0.450 | -0.466 | -0.283 | 0.329 | 0.408 | 1.000 | -0.140 | -0.422 |
| subtokenization_indicator_std | 0.371 | 0.212 | 0.196 | -0.288 | 0.229 | 0.446 | 0.370 | 0.140 | 0.113 | -0.403 | 0.237 | 0.432 | 0.444 | 0.836 | 0.682 | 0.426 | -0.140 | 1.000 | 0.506 |
| subtokenization_indicator_sum | 0.817 | 0.423 | 0.367 | -0.658 | 0.382 | 0.990 | 0.826 | 0.448 | 0.387 | -0.566 | 0.407 | 0.989 | 0.995 | 0.860 | 0.295 | 0.193 | -0.422 | 0.506 | 1.000 |
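The "High correlation" alerts can be cross-checked against this matrix. The sketch below counts, for one row, the other fields whose absolute coefficient reaches a cutoff. The 0.5 cutoff is an assumption: it reproduces the alert in the overview that pairs GOODS_DESCRIPTION_len_chars_median with GOODS_DESCRIPTION_len_chars_mean and 3 other fields, but the report does not state the value it used. `highly_correlated` is a hypothetical helper name.

```python
def highly_correlated(row, own_name, cutoff=0.5):
    """Names of the other fields whose |coefficient| is at least `cutoff`."""
    return [name for name, r in row.items()
            if name != own_name and abs(r) >= cutoff]

# Row GOODS_DESCRIPTION_len_chars_median, copied from the matrix above.
chars_median_row = {
    "GOODS_DESCRIPTION_len_chars_max": 0.479,
    "GOODS_DESCRIPTION_len_chars_mean": 0.881,
    "GOODS_DESCRIPTION_len_chars_median": 1.000,
    "GOODS_DESCRIPTION_len_chars_min": 0.045,
    "GOODS_DESCRIPTION_len_chars_std": 0.476,
    "GOODS_DESCRIPTION_len_chars_sum": 0.433,
    "GOODS_DESCRIPTION_len_words_max": 0.477,
    "GOODS_DESCRIPTION_len_words_mean": 0.829,
    "GOODS_DESCRIPTION_len_words_median": 0.865,
    "GOODS_DESCRIPTION_len_words_min": 0.082,
    "GOODS_DESCRIPTION_len_words_std": 0.501,
    "GOODS_DESCRIPTION_len_words_sum": 0.424,
    "HS06_count": 0.345,
    "subtokenization_indicator_max": 0.345,
    "subtokenization_indicator_mean": 0.331,
    "subtokenization_indicator_median": 0.317,
    "subtokenization_indicator_min": 0.005,
    "subtokenization_indicator_std": 0.196,
    "subtokenization_indicator_sum": 0.367,
}

flagged = highly_correlated(chars_median_row, "GOODS_DESCRIPTION_len_chars_median")
print(len(flagged), flagged)
```

With this cutoff the row yields four fields, matching the "1 named field and 3 others" wording of the alert.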
Missing values
Sample
First rows
| HS04 | HS06_count | GOODS_DESCRIPTION_len_words_sum | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_std | subtokenization_indicator_sum | subtokenization_indicator_min | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_max | subtokenization_indicator_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0101 | 7 | 35 | 2 | 5.000000 | 6.0 | 7 | 2.000000 | 203 | 15 | 29.000000 | 31.0 | 46 | 11.401754 | 9.238095 | 1.000000 | 1.319728 | 1.285714 | 1.666667 | 0.255646 |
| 0102 | 10 | 35 | 2 | 3.500000 | 3.0 | 7 | 1.840894 | 179 | 8 | 17.900000 | 15.5 | 34 | 9.757618 | 13.333333 | 1.000000 | 1.333333 | 1.000000 | 2.333333 | 0.496904 |
| 0103 | 2 | 3 | 1 | 1.500000 | 1.5 | 2 | 0.707107 | 20 | 4 | 10.000000 | 10.0 | 16 | 8.485281 | 3.000000 | 1.000000 | 1.500000 | 1.500000 | 2.000000 | 0.707107 |
| 0104 | 3 | 6 | 1 | 2.000000 | 1.0 | 4 | 1.732051 | 33 | 4 | 11.000000 | 9.0 | 20 | 8.185353 | 3.500000 | 1.000000 | 1.166667 | 1.000000 | 1.500000 | 0.288675 |
| 0105 | 76 | 591 | 2 | 7.776316 | 7.0 | 20 | 3.900765 | 3387 | 14 | 44.565789 | 37.0 | 124 | 23.405034 | 107.874584 | 1.000000 | 1.419402 | 1.285714 | 2.687500 | 0.411470 |
| 0106 | 31 | 105 | 1 | 3.387097 | 3.0 | 8 | 1.782833 | 595 | 6 | 19.193548 | 17.0 | 56 | 11.279537 | 44.825000 | 1.000000 | 1.445968 | 1.000000 | 3.000000 | 0.569718 |
| 0201 | 1 | 1 | 1 | 1.000000 | 1.0 | 1 | NaN | 4 | 4 | 4.000000 | 4.0 | 4 | NaN | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | NaN |
| 0202 | 12 | 33 | 1 | 2.750000 | 3.0 | 5 | 1.215431 | 194 | 4 | 16.166667 | 18.0 | 24 | 6.899715 | 21.150000 | 1.000000 | 1.762500 | 1.875000 | 2.500000 | 0.514542 |
| 0203 | 208 | 1298 | 1 | 6.240385 | 6.0 | 14 | 3.067660 | 8216 | 8 | 39.500000 | 39.0 | 89 | 14.113070 | 715.253502 | 1.000000 | 3.438719 | 1.781746 | 33.000000 | 5.543883 |
| 0204 | 4 | 13 | 2 | 3.250000 | 2.5 | 6 | 1.892969 | 82 | 13 | 20.500000 | 15.0 | 39 | 12.476645 | 6.833333 | 1.333333 | 1.708333 | 1.750000 | 2.000000 | 0.343592 |
Last rows
| HS04 | HS06_count | GOODS_DESCRIPTION_len_words_sum | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_std | subtokenization_indicator_sum | subtokenization_indicator_min | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_max | subtokenization_indicator_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9616 | 48 | 135 | 1 | 2.812500 | 3.0 | 6 | 1.024306 | 804 | 4 | 16.750000 | 17.0 | 39 | 6.673224 | 66.600000 | 1.0 | 1.387500 | 1.125000 | 3.500000 | 0.566495 |
| 9617 | 140 | 532 | 1 | 3.800000 | 3.0 | 14 | 1.885957 | 3367 | 3 | 24.050000 | 21.0 | 88 | 12.518375 | 271.683333 | 1.0 | 1.940595 | 1.732143 | 6.333333 | 0.805809 |
| 9618 | 55 | 149 | 1 | 2.709091 | 2.0 | 6 | 1.242350 | 993 | 5 | 18.054545 | 15.0 | 46 | 8.754720 | 99.133333 | 1.0 | 1.802424 | 1.666667 | 5.000000 | 0.783828 |
| 9619 | 270 | 1312 | 1 | 4.859259 | 5.0 | 17 | 2.714858 | 7045 | 4 | 26.092593 | 27.5 | 73 | 12.376180 | 485.317482 | 1.0 | 1.797472 | 1.571429 | 5.000000 | 0.687515 |
| 9620 | 44 | 166 | 1 | 3.772727 | 3.0 | 9 | 1.951310 | 1010 | 6 | 22.954545 | 21.0 | 57 | 11.309503 | 77.407937 | 1.0 | 1.759271 | 1.550000 | 6.500000 | 0.867186 |
| 9701 | 52 | 166 | 1 | 3.192308 | 3.0 | 13 | 2.376671 | 1111 | 6 | 21.365385 | 17.0 | 71 | 14.585823 | 71.359829 | 1.0 | 1.372304 | 1.000000 | 4.000000 | 0.643291 |
| 9702 | 1 | 3 | 3 | 3.000000 | 3.0 | 3 | NaN | 14 | 14 | 14.000000 | 14.0 | 14 | NaN | 1.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | NaN |
| 9703 | 31 | 100 | 1 | 3.225806 | 2.0 | 12 | 2.261411 | 672 | 9 | 21.677419 | 17.0 | 74 | 14.246373 | 53.633333 | 1.0 | 1.730108 | 1.500000 | 4.000000 | 0.844376 |
| 9704 | 6 | 16 | 1 | 2.666667 | 2.5 | 4 | 1.211060 | 100 | 7 | 16.666667 | 14.0 | 30 | 9.025889 | 7.000000 | 1.0 | 1.166667 | 1.000000 | 1.750000 | 0.302765 |
| 9705 | 4 | 11 | 1 | 2.750000 | 2.0 | 6 | 2.217356 | 75 | 8 | 18.750000 | 15.0 | 37 | 12.632630 | 6.666667 | 1.0 | 1.666667 | 1.500000 | 2.666667 | 0.816497 |
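Rows with HS06_count equal to 1, such as 0201 and 9702, show NaN in every _std column because a sample standard deviation is undefined for a single observation. The sketch below illustrates the six per-group aggregates under two assumptions: the standard deviation uses the n − 1 denominator (the pandas default), and `summarize` is a hypothetical helper, not part of the pipeline that produced the report. The input `[4, 16]` is inferred from row 0103 (count 2, minimum 4, maximum 16) rather than read from the underlying data.

```python
import math
import statistics

def summarize(values):
    """Return the six aggregates shown in the sample table for one group."""
    return {
        "sum": sum(values),
        "min": min(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "max": max(values),
        # The sample standard deviation needs at least two points; use NaN otherwise.
        "std": statistics.stdev(values) if len(values) > 1 else math.nan,
    }

row_0103_chars = summarize([4, 16])  # two descriptions of 4 and 16 characters
row_0201_chars = summarize([4])      # a single description of 4 characters

print(row_0103_chars)
print(row_0201_chars)
```

The first call reproduces the character-length columns of row 0103 (sum 20, mean 10.0, median 10.0, std 8.485281). The second returns NaN for std, as in row 0201.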
Duplicate rows
Most frequently occurring
| | HS06_count | GOODS_DESCRIPTION_len_words_sum | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_std | subtokenization_indicator_sum | subtokenization_indicator_min | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_max | subtokenization_indicator_std | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 10 | 10 | 10.0 | 10.0 | 10 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 5 |
| 0 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | NaN | 4 | 4 | 4.0 | 4.0 | 4 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 4 |
| 1 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | NaN | 5 | 5 | 5.0 | 5.0 | 5 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 3 |
| 6 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 11 | 11 | 11.0 | 11.0 | 11 | NaN | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | NaN | 3 |
| 2 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | NaN | 8 | 8 | 8.0 | 8.0 | 8 | NaN | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | NaN | 2 |
| 3 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | NaN | 9 | 9 | 9.0 | 9.0 | 9 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 2 |
| 4 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 8 | 8 | 8.0 | 8.0 | 8 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 2 |
| 7 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 13 | 13 | 13.0 | 13.0 | 13 | NaN | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | NaN | 2 |
| 8 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 14 | 14 | 14.0 | 14.0 | 14 | NaN | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | NaN | 2 |
| 9 | 1 | 3 | 3 | 3.0 | 3.0 | 3 | NaN | 19 | 19 | 19.0 | 19.0 | 19 | NaN | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | NaN | 2 |
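Every row listed here has HS06_count equal to 1, so its three _std columns are NaN. That matters when recounting duplicates by hand, because NaN never compares equal to itself and identical-looking rows would otherwise be counted as distinct. The sketch below uses a hypothetical `duplicate_counts` helper and made-up, shortened toy rows (not taken from the dataset), and maps NaN to None before counting.

```python
import math
from collections import Counter

def normalize(row):
    """Replace NaN with None so that rows with missing values compare equal."""
    return tuple(None if isinstance(v, float) and math.isnan(v) else v
                 for v in row)

def duplicate_counts(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(normalize(r) for r in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Toy rows: (HS06_count, words_sum, chars_sum, words_std); illustrative only.
toy_rows = [
    (1, 2, 10, float("nan")),
    (1, 2, 10, float("nan")),
    (1, 1, 4, float("nan")),
    (3, 6, 33, 1.732051),
]

result = duplicate_counts(toy_rows)
print(result)
```

Only the first pattern appears more than once, so it is the only one reported.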